Sorted Neighborhood for Schema-free RDF Data
Authors
Abstract
Entity Resolution (ER) concerns identifying pairs of entities that refer to the same underlying entity. To avoid O(n²) pairwise comparisons of n entities, blocking methods are used. Sorted Neighborhood is an established blocking method for relational databases. It has not been applied to the schema-free Resource Description Framework (RDF) data sources that are widely prevalent in the Linked Data ecosystem. This paper presents a Sorted Neighborhood workflow that may be applied to schema-free RDF data. The workflow is modular and makes minimal assumptions about its inputs. Empirical evaluations of the proposed algorithm on five real-world benchmarks demonstrate its utility compared to two state-of-the-art blocking baselines.
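As an illustration of why blocking avoids the quadratic comparison cost, here is a minimal sketch of the classic record-oriented Sorted Neighborhood idea: sort the entities by a blocking key, slide a fixed-size window over the sorted list, and compare only entities that share a window. This is not the RDF workflow proposed in the paper. The function name `sorted_neighborhood_pairs`, the `blocking_key` callable, and the `window` parameter are hypothetical names chosen for this example.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def sorted_neighborhood_pairs(
    entities: Iterable[Hashable],
    blocking_key: Callable[[Hashable], str],
    window: int,
) -> Set[Tuple[Hashable, Hashable]]:
    """Return candidate pairs whose sorted positions are fewer than `window` apart.

    Illustrative sketch only: `blocking_key` is an assumed user-supplied
    function that maps an entity to a sortable string.
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    # Sort once by blocking key so that likely matches end up near each other.
    ordered: List[Hashable] = sorted(entities, key=blocking_key)

    pairs: Set[Tuple[Hashable, Hashable]] = set()
    for i, left in enumerate(ordered):
        # Pair each entity with the next (window - 1) entities in sorted order.
        for right in ordered[i + 1 : i + window]:
            pairs.add((left, right))
    return pairs


if __name__ == "__main__":
    names = ["rob jones", "alicia smith", "zed", "bob jones", "alice smith"]
    for pair in sorted(sorted_neighborhood_pairs(names, str.lower, window=2)):
        print(pair)
```

With n entities and a window of size w, the sketch emits at most (w - 1) * n candidate pairs instead of n * (n - 1) / 2, at the price of missing matches whose keys sort far apart.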
Similar resources
Sorted Neighborhood for the Semantic Web
Entity Resolution (ER) concerns identifying logically equivalent entity pairs across databases. To avoid Θ(n²) pairwise comparisons of n entities, blocking methods are used. Sorted Neighborhood is an established blocking method for relational databases. It has not been applied on graph-based data models such as the Resource Description Framework (RDF). This poster presents a modular workflow for...
ExpLOD: Exploring Interlinking and RDF Usage in the Linked Open Data Cloud
The Linking Open Data community project is promoting the creation of interlinked RDF datasets with links between data items identified using dereferenceable URIs. This promising direction for publishing data on the web brings forward a number of issues. A key challenge is to understand the data, the schema, and the interlinks that are actually used both within and across linked datasets. Unders...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2] and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Functional Queries to Wrapped Educational Semantic Web Meta-Data
The aim of the Edutella project is to provide a peer-to-peer infrastructure for educational material retrieval using semantic web meta-data descriptions of educational resources. Edutella uses the semantic web meta-data description languages RDF and RDF-Schema for describing web resources. The aim of this work is to wrap the Edutella infrastructure with a functional mediator system. This makes ...
Object-Oriented RuleML: User-Level Roles, URI-Grounded Clauses, and Order-Sorted Terms
This paper describes an Object-Oriented extension to RuleML as a modular combination of three sublanguages. (1) User-level roles provide frame-like slot representations as unordered argument collections in atoms and complex terms. (2) URI-grounded clauses allow for ‘webizing’ using URIs as object identifiers for facts and rules. (3) Order-sorted terms permit typed variables via Web links into ta...